2020-01-16

Can open source tools be a true alternative for many applications?

  • commercial software vendors and products such as Microsoft (Windows), Apple, … GraphPad, SPSS … produce usable, mostly good software

  • but:

    • products are getting increasingly expensive and dependent on updates and support
    • users lose their independence and individuality
    • users can personalize, optimize and adapt the applications to their needs only to a limited degree
    • most of these products deliver results, but they are rarely efficient, powerful or leading-edge
    • the product placement strategies of these companies are unmatched → users are increasingly reduced to dependent consumers rather than being their own developers



Why not use open source software tools with millions of active developers, which can be even more powerful, efficient and elegant than commercial products?

DevOps, modern (software) development practices

  • ensure version control, reproducibility, collaboration and efficient workflows

  • automate everything that can be automated

  • once you have done the exact same operation more than three times, write a function for it and automate it
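As an illustration of the "more than three times" rule, the sketch below replaces a copy-pasted summary with one small function that is applied to every column at once. `describe()` is a hypothetical helper, and `mtcars` is simply an example data set that ships with R.

```r
# Hypothetical helper: the summary we would otherwise copy-paste for each variable
describe <- function(x) {
  c(n    = sum(!is.na(x)),
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE))
}

# Apply it to any number of columns in one call instead of repeating the code
summary_table <- sapply(mtcars[, c("mpg", "hp", "wt")], describe)
round(summary_table, 2)
```

Adding another variable now means adding one column name, not another copy of the code.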

data science

  • much of the research in healthcare and medicine is data science

  • successful biostatisticians and developers tackle their problems the data science way

  • data science provides thousands of modelling techniques, algorithms, version control systems and workflows that can be applied in healthcare and biomedical research

data science core toolbox: R

  • founded 1992
  • initially only a terminal interface, i.e. a command-line interface
  • initially designed as statistical software
  • R has developed rapidly and is currently, together with Python, one of the fastest-growing programming languages
  • for computationally intensive tasks, R relies on code written in compiled languages such as C++ or Fortran
  • a community of millions of users and contributors
  • about 15,000 packages

  • package repositories include, for example, CRAN, Bioconductor and GitHub
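Packages from these three sources are installed with different functions: `install.packages()` for CRAN, `BiocManager::install()` for Bioconductor and `remotes::install_github()` for GitHub. The wrapper below is a hypothetical convenience function, a sketch that assumes an internet connection for the actual installation; the package and repository names in the comments are only examples.

```r
# Hypothetical helper: install a package only if it is not installed yet
install_if_missing <- function(pkg, source = c("CRAN", "Bioconductor", "GitHub"),
                               repo = NULL) {
  source <- match.arg(source)
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  if (source == "CRAN") {
    install.packages(pkg)
  } else if (source == "Bioconductor") {
    # BiocManager itself is a CRAN package
    if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
    BiocManager::install(pkg)
  } else {
    # GitHub packages are addressed as "user/repo"
    if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
    remotes::install_github(repo)
  }
  invisible(TRUE)
}

# Examples (these download and install packages):
# install_if_missing("ggplot2")                                        # CRAN
# install_if_missing("limma", source = "Bioconductor")                 # Bioconductor
# install_if_missing("remotes", source = "GitHub", repo = "r-lib/remotes")  # GitHub
```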


graphical user interface, integrated development environment: RStudio


  • many different add-ins (for example to manipulate data in a spreadsheet (Excel) style, to insert citations and pictures, to edit tables, for plotting, etc.)

  • options for setting up connections to databases, servers and Git repositories, an internal HTML viewer, etc.

data science core packages for R

library(tidyverse)
tidyverse_packages()
 [1] "broom"       "cli"         "crayon"      "dplyr"       "dbplyr"     
 [6] "forcats"     "ggplot2"     "haven"       "hms"         "httr"       
[11] "jsonlite"    "lubridate"   "magrittr"    "modelr"      "purrr"      
[16] "readr"       "readxl"      "reprex"      "rlang"       "rstudioapi" 
[21] "rvest"       "stringr"     "tibble"      "tidyr"       "xml2"       
[26] "tidyverse"  

Overview

  • tidy data

  • manipulate data

  • The Grammar of Graphics

  • create reports, articles…

  • write books

  • create Web Apps

  • functional programming
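As a small illustration of "tidy data" and "manipulate data", the sketch below reshapes a made-up wide table (one column per week) into tidy form with tidyr and summarises it with dplyr. The data values and column names are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical "wide" table: one column per measurement time
wide <- data.frame(
  patient = c("A", "B", "C"),
  week_0  = c(5.1, 6.3, 4.8),
  week_4  = c(4.2, 5.9, 4.1)
)

# Tidy data: one row per observation (patient x week), one column per variable
long <- wide %>%
  pivot_longer(cols = starts_with("week"), names_to = "week",
               names_prefix = "week_", values_to = "value") %>%
  mutate(week = as.integer(week))

# Manipulate data: mean value and number of patients per week
summary_by_week <- long %>%
  group_by(week) %>%
  summarise(mean_value = mean(value), n = n())

summary_by_week
```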

R Markdown and inline code

#`r......`



# Five plus three is `r 5+3`


Five plus three is 8.
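The same mechanism can be tried outside an .Rmd file: knitr's `knit()` accepts a string through its `text` argument and returns the compiled text, with every inline `` `r ...` `` expression replaced by its value. A minimal sketch:

```r
library(knitr)

# Compile a string instead of a file; the inline R code is evaluated
out <- knit(text = "Five plus three is `r 5 + 3`.", quiet = TRUE)
out <- paste(out, collapse = "\n")  # make sure we have a single string
print(out)
```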

       name gender       date   location         reason
Peter Peter      m 11.09.2020    Dresden steamboat trip
Jonas Jonas      m 12.09.2020     Meißen      city tour
Anna   Anna      f 13.09.2020 Moritzburg  hunting party


Name <- "Peter"
 
# Dear `r greetings[Name,1]`,
# shall we meet on `r greetings[Name,3]` in `r greetings[Name,4]` ...

Dear Peter,
shall we meet on 11.09.2020 in Dresden and take a steamboat trip? It will take about 3 hours.

Name <- "Jonas"

Dear Jonas,
shall we meet on 12.09.2020 in Meißen and take a city tour? It will take about 3 hours.
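Instead of setting `Name` by hand for each letter, the same inline-code template can be knitted once per row. This is a sketch, assuming knitr's `knit(text = ...)` interface; the `greetings` data frame is rebuilt from the table above (with row names so that `greetings[Name, k]` works), and the template wording, including the use of the fifth column for the activity, is illustrative.

```r
library(knitr)

# Rebuild the table shown above; row names allow lookups such as greetings["Peter", 3]
greetings <- data.frame(
  name     = c("Peter", "Jonas", "Anna"),
  gender   = c("m", "m", "f"),
  date     = c("11.09.2020", "12.09.2020", "13.09.2020"),
  location = c("Dresden", "Meißen", "Moritzburg"),
  reason   = c("steamboat trip", "city tour", "hunting party"),
  row.names = c("Peter", "Jonas", "Anna"),
  stringsAsFactors = FALSE
)

# Illustrative inline-code template
template <- paste(
  "Dear `r greetings[Name, 1]`,",
  "shall we meet on `r greetings[Name, 3]` in `r greetings[Name, 4]`",
  "and take a `r greetings[Name, 5]`?"
)

# Knit the template once per person; 'Name' lives in the function's environment
letters_out <- vapply(rownames(greetings), function(Name) {
  paste(knit(text = template, quiet = TRUE, envir = environment()), collapse = "\n")
}, character(1))

cat(letters_out, sep = "\n\n")
```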

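library(ggplot2)  # provides the diamonds data set
library(broom)    # tidy() turns test results into a data frame
library(DT)       # interactive HTML tables
options(digits = 4)

# Pearson correlation between carat and price, two-sided test
x <- cor.test(diamonds$carat, diamonds$price,
              method = "pearson",
              alternative = "two.sided")
x <- tidy(x)  # one row: estimate, statistic, p.value, conf.low, conf.high, ...

datatable(x, editable = "cell", options = list(scrollX = TRUE))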


# The correlation between diamond price and carat
# is `r cor(diamonds$carat, diamonds$price, method = "pearson")`
# (95% CI = `r x$conf.low`-`r x$conf.high`).

The correlation between diamond price and carat is 0.9216 (95% CI = 0.9203-0.9229).
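The Grammar of Graphics builds a plot from data, aesthetic mappings and layers. A minimal ggplot2 sketch for the same carat/price relationship; the transparency and the linear-model smoother are illustrative choices, not part of the analysis above.

```r
library(ggplot2)

# data + aesthetic mapping (carat on x, price on y)
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  # layer 1: one semi-transparent point per diamond
  geom_point(alpha = 0.1) +
  # layer 2: linear regression line with confidence band
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Carat", y = "Price (USD)", title = "Diamond price vs. carat")

# print(p) renders the plot in the RStudio viewer or in the knitted document
```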

git

  • free and open-source distributed version control system (DVCS)

  • command-line interface (for example Git Bash): automatable, efficient

  • GUI (for example SourceTree client): for some tasks easier to use

  • What version control provides:

    • complete history tracked and available

    • team work & many workflows supported

    • local and/or remote repositories

    • efficiency through team communication, easy syncs and consistency
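Because Git has a command-line interface, it can be driven from scripts. A minimal sketch, assuming `git` is installed and on the PATH, that calls it from R with `system2()` to create a throw-away repository, commit two versions of a file and list the history. `git()` is a hypothetical helper, and the user name and e-mail are placeholders.

```r
# Hypothetical helper: run a git command inside 'dir' and capture its output
git <- function(..., dir) {
  system2("git", c("-C", shQuote(dir), ...), stdout = TRUE, stderr = TRUE)
}

# Throw-away repository in a temporary directory
repo <- tempfile("demo-repo")
dir.create(repo)
git("init", dir = repo)

# Placeholder identity, passed per command so no global git config is changed
who <- c("-c", "user.name=Demo", "-c", "user.email=demo@example.com")
script <- file.path(repo, "analysis.R")

# Version 1
writeLines("x <- 1", script)
git("add", "analysis.R", dir = repo)
git(who, "commit", "-m", shQuote("First version"), dir = repo)

# Version 2
writeLines(c("x <- 1", "y <- x + 1"), script)
git(who, "commit", "-a", "-m", shQuote("Add y"), dir = repo)

# Complete history: one line per commit
history <- git("log", "--oneline", dir = repo)
print(history)
```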

Why does reproducibility matter?

Links for learning R in a data science way

Links for learning Git & GitHub version control

Some basic R Markdown syntax

- item 1 
- item 2 
- item 3 
  • item 1
  • item 2
  • item 3
1. item 1 
2. item 2 
3. item 3 
  1. item 1
  2. item 2
  3. item 3

Citation

  • For citations, BibTeX databases are recommended because they work best with LaTeX/PDF output.

  • A BibTeX database is a plain text file with the filename extension .bib.

  • The open-source reference manager Zotero can be used to manage .bib files directly from an internet browser.

  • The RStudio add-in citr can be used to insert citations.
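Because a .bib file is plain text, entries can also be written from R. A sketch using base R's `citation()` and `toBibtex()` to create an entry for R itself; the file name `references.bib` and the key `R-base` are assumptions, not requirements.

```r
# BibTeX entry for R itself (use citation("pkgname") for an installed package)
bib <- toBibtex(citation())

# citation() leaves the key empty ("@Manual{,"), so insert one that can be cited as [@R-base]
bib <- sub("^(@\\w+\\{),", "\\1R-base,", bib)

# Write to a (hypothetical) bibliography file referenced in the YAML header
bib_file <- file.path(tempdir(), "references.bib")
writeLines(bib, bib_file)
cat(readLines(bib_file), sep = "\n")
```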

#[@lohaus_hpv16_2014]

(Lohaus et al. 2014)

bibliography

Lohaus, Fabian, Annett Linge, Inge Tinhofer, Volker Budach, Eleni Gkika, Martin Stuschke, Panagiotis Balermpas, et al. 2014. “HPV16 DNA Status Is a Strong Prognosticator of Loco-Regional Control After Postoperative Radiochemotherapy of Locally Advanced Oropharyngeal Carcinoma: Results from a Multicentre Explorative Study of the German Cancer Consortium Radiation Oncology Group (DKTK-ROG).” Radiotherapy and Oncology: Journal of the European Society for Therapeutic Radiology and Oncology 113 (3): 317–23. https://doi.org/10.1016/j.radonc.2014.11.011.